DETAILED ACTION
Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in 
public use or on sale in this country, more than one year prior to the date of application for patent in the 
United States. 

2. Claims 1 and 6 are rejected under 35 U.S.C. 102(b) as being anticipated by
Sharma et al. (5,862,519).

The U.S. patent of Sharma et al. teaches a computer-based apparatus (system), and
hence the methods and computer code necessary to implement this system are
inevitably part of its teachings.
Limitations compared with Sharma et al.:

• Receiving frames of acoustic data
• Determining cepstral coefficients for each of the received frames of acoustic data
  Sharma et al.: Computing the Spectral Variation Function (SVF), which is based on the Euclidean norm of the delta cepstral coefficients of each individual frame (Col. 6, lines 32-38). This process inherently presupposes fragmentation of the acoustic data into frames and computation of cepstral coefficients for each frame.

• Segmenting the received frames of acoustic data based on the determined cepstral coefficients
  Sharma et al.: Segmentation is performed based on the values of Kmax and Kmin (elems. 20, 22, FIG. 1 and Col. 7, lines 23-28). Kmax (maximum number of segments for each frame) is computed using the SVF (Col. 6, lines 29-31), which itself is a function of cepstral coefficients (Eq. 2). Therefore, segmentation of each frame is ultimately performed based on the cepstral coefficients.
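The SVF-based reasoning above can be illustrated with a minimal Python sketch. It rests on two assumptions made only for illustration. First, SVF at frame n is taken as the Euclidean norm of the first difference between consecutive cepstral vectors. Second, candidate segment boundaries are taken as interior local maxima of the SVF above a threshold. Sharma et al.'s exact Eq. 2 and their Kmax/Kmin logic are not reproduced. The function names `spectral_variation`, `svf_peaks`, and `segment_frames` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def spectral_variation(cepstra: Sequence[Sequence[float]]) -> List[float]:
    """Assumed SVF: Euclidean norm of the delta (first difference) of
    consecutive cepstral vectors. Frame 0 has no predecessor, so it gets 0."""
    svf = [0.0]
    for prev, cur in zip(cepstra, cepstra[1:]):
        svf.append(math.sqrt(sum((c - p) ** 2 for c, p in zip(cur, prev))))
    return svf


def svf_peaks(svf: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Interior local maxima of the SVF above `threshold` (hypothetical
    boundary rule): frames where the spectrum changes most rapidly."""
    return [
        n
        for n in range(1, len(svf) - 1)
        if svf[n] > threshold and svf[n] > svf[n - 1] and svf[n] >= svf[n + 1]
    ]


def segment_frames(n_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Split frame indices [0, n_frames) into half-open (start, end)
    segments at each boundary index."""
    edges = [0] + sorted(boundaries) + [n_frames]
    return [(s, e) for s, e in zip(edges, edges[1:]) if e > s]


# Two steady regions with an abrupt spectral change at frame 2.
cepstra = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]
svf = spectral_variation(cepstra)  # [0.0, 0.0, 5.0, 0.0, 0.0]
print(segment_frames(len(cepstra), svf_peaks(svf)))  # [(0, 2), (2, 5)]
```

In this toy input, the only SVF peak falls at the frame where the cepstral vector jumps. The segmentation therefore follows directly from the cepstral coefficients, which is the point of the claim mapping.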



3. Claims 10, 12-15, 18-20, 22-25, 28, and 29 are rejected under 35
U.S.C. 102(b) as being anticipated by Juang et al. (5,812,972).



The U.S. patent of Juang et al. teaches a computer-based apparatus (system), and
hence the methods and computer code necessary to implement this system are
inevitably part of its teachings.
Limitations compared with Juang et al.:

Claims 10, 15
• Receiving frames of acoustic data
  Juang et al.: Elem. 315, FIG. 3, which corresponds to a frame of acoustic data.
• Determining segmentation information corresponding to the received frames of acoustic data
  Juang et al.: Elem. 330, FIG. 3.
• Determining at least one weighting parameter based on the determined segmentation information
  Juang et al.: Elem. 360, FIG. 3, where W(i) is a weighting factor based on the confidence level of the current segmentation vector (Col. 8, lines 55-59).
• Recognizing patterns in the received frames of acoustic data using the at least one weighting parameter
  Juang et al.: Elem. 355, FIG. 3.

Claims 12, 17
• Determining, based on the frames of acoustic data, recognition hypothesis scores using a Hidden Markov Model
  Juang et al.: Elem. 327, FIG. 3 uses an HMM to determine the next likely state (Col. 7, lines 44-48).

Claims 13, 14, 19
• Modifying the recognition hypothesis scores based on the at least one weighting parameter, and the recognizing patterns in the frames of acoustic data further uses the modified recognition hypothesis scores
  Juang et al.: As can be seen from FIG. 3, readjusted scores are fed from block 360 back into block 325. Thus the HMM and corresponding scores are updated using the weighting factor Wi (elem. 360, FIG. 3), which reflects the level of confidence in the current model. Therefore, the final recognition result (Elem. 355, FIG. 3) will incorporate the modification through weighting factor W.

Claims 20, 29
• Receiving frames of acoustic data
  Juang et al.: Elem. 315, FIG. 3, which corresponds to a frame of acoustic data.
• Determining first segmentation information corresponding to the received frames of acoustic data and second segmentation information corresponding to the received frames of acoustic data
  Juang et al.: For each sequential frame, a modified observation vector O"(i) is computed along with a segmentation vector A(i) (Col. 8, lines 7-17).
• Determining at least one weighting parameter based on the determined second segmentation information
  Juang et al.: Elem. 360, FIG. 3, where W(i) is a weighting factor based on the confidence level of the current segmentation vector (Col. 8, lines 55-59).
• Recognizing patterns in the received frames of acoustic data using the at least one weighting parameter
  Juang et al.: Elem. 355, FIG. 3.

Claim 22
• Comparing the determined first and second segmentation information
  Juang et al.: Col. 8, lines 45-48.

Claim 23
• The recognizing patterns in the frames of acoustic data is based on the comparison of the first and second segmentation information
  Juang et al.: As can be seen from FIG. 3, readjusted scores are fed from block 360 back into block 325. Thus the HMM and corresponding scores are updated using the weighting factor Wi (elem. 360, FIG. 3), which reflects the level of confidence in the current model. Therefore, the final recognition result (Elem. 355, FIG. 3) will incorporate the modification through weighting factor W, which includes a comparison of O"(i) and A(i).

Claim 24
• Determining, based on the frames of acoustic data, recognition hypothesis scores using a Hidden Markov Model
  Juang et al.: Elem. 327, FIG. 3 uses an HMM to determine the next likely state (Col. 7, lines 44-48).

Claims 25, 28
• Modifying the recognition hypothesis scores based on the at least one weighting parameter, and the recognizing patterns in the frames of acoustic data further uses the modified recognition hypothesis scores
  Juang et al.: As can be seen from FIG. 3, readjusted scores are fed from block 360 back into block 325. Thus the HMM and corresponding scores are updated using the weighting factor Wi (elem. 360, FIG. 3), which reflects the level of confidence in the current model. Therefore, the final recognition result (Elem. 355, FIG. 3) will incorporate the modification through weighting factor W.
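The feedback described in the chart above, where weights from block 360 modify the HMM scores used in block 325, can be sketched as follows. This is a hypothetical illustration, not Juang et al.'s formulation. It assumes that each hypothesis has per-frame HMM log-likelihoods and that the confidence weight W(i) scales the contribution of frame i, so low-confidence frames influence the final decision less. `weighted_score` and `recognize` are hypothetical names.

```python
from typing import Dict, Sequence


def weighted_score(frame_logliks: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-frame HMM log-likelihoods, each scaled by an assumed
    confidence weight W(i) in [0, 1] for frame i."""
    if len(frame_logliks) != len(weights):
        raise ValueError("exactly one weight per frame is required")
    return sum(w * ll for w, ll in zip(weights, frame_logliks))


def recognize(hypotheses: Dict[str, Sequence[float]], weights: Sequence[float]) -> str:
    """Pick the hypothesis label whose weighted score is highest."""
    return max(hypotheses, key=lambda label: weighted_score(hypotheses[label], weights))


# Per-frame log-likelihoods for two competing hypotheses.
hyps = {"a": [-1.0, -5.0], "b": [-3.0, -1.0]}
print(recognize(hyps, [1.0, 1.0]))  # b  (unweighted: a = -6.0, b = -4.0)
print(recognize(hyps, [1.0, 0.1]))  # a  (frame 1 down-weighted: a = -1.5, b = -3.1)
```

The example shows how a weight derived from segmentation confidence can change the final recognition result even though the underlying HMM scores are unchanged.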



Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 2-5, 7-9, and 30 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Sharma et al.

The U.S. patent of Sharma et al. teaches a computer-based apparatus (system), and
hence the methods and computer code, such as data structures, necessary to
implement this system are inevitably part of its teachings.

As per claims 2-4, Sharma et al. do not disclose determining the number of
peaks in cepstral coefficients for each frame and then segmenting the frames based on
the comparison of the number of peaks in each of the sequential frames
("Peak_N_Difference"). The applicant discloses that the reason for tracking the variation
in the number of peaks is to identify frames containing phoneme boundaries, where the
number of cepstral coefficient peaks changes rapidly [Disclosure, page 6, lines 15-16].

Similarly, Sharma et al. applies this principle in computing the maximum number
of peaks (Kmax). Sharma et al. discloses a Spectral Variation Function (SVF), which is
computed based on the time variation of the cepstral coefficients of each frame (Col. 6,
lines 32-38). To identify segments based on phoneme boundaries, Sharma tracks the
peaks in the SVF, because the SVF exhibits peaks at boundaries where the characteristics of
speech change rapidly (Col. 6, lines 63-67). The SVF tracks the frame-to-frame changes
between corresponding cepstral coefficients of successive frames (see Eq. 2). Thus,
changes in the number of peaks would also affect the SVF, so that the SVF would
track the frames identified by the applicant's method as having a high
"Peak_N_Difference".

Application/Control Number: 09/826,715 Page 8

Art Unit: 2655

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify Sharma to use the difference in the number of 
cepstral peaks instead of SVF, because these methods track the same cepstral 
properties in a similar fashion. Computing the difference in the number of peaks is a 
variation of SVF, and while not being as exact as SVF, it has the advantage of 
computational simplicity. 
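The applicant's "Peak_N_Difference" approach, as characterized above, can be sketched for comparison. This sketch makes two assumptions. First, a "peak" is an interior local maximum of a frame's cepstral coefficient vector. Second, a boundary is flagged when the peak count changes by at least a threshold between sequential frames. The threshold and all function names are hypothetical. The sketch also illustrates the computational-simplicity point: each frame is reduced to a single integer count rather than a Euclidean norm taken over all coefficients.

```python
from typing import List, Sequence


def count_peaks(coeffs: Sequence[float]) -> int:
    """Number of interior local maxima in one frame's cepstral coefficients."""
    return sum(
        1
        for k in range(1, len(coeffs) - 1)
        if coeffs[k] > coeffs[k - 1] and coeffs[k] > coeffs[k + 1]
    )


def peak_n_difference(cepstra: Sequence[Sequence[float]]) -> List[int]:
    """Absolute change in peak count between each frame and the previous one
    (0 for the first frame, which has no predecessor)."""
    counts = [count_peaks(frame) for frame in cepstra]
    return [0] + [abs(cur - prev) for prev, cur in zip(counts, counts[1:])]


def peak_boundaries(cepstra: Sequence[Sequence[float]], threshold: int = 1) -> List[int]:
    """Frames whose Peak_N_Difference meets the (hypothetical) threshold."""
    return [n for n, d in enumerate(peak_n_difference(cepstra)) if d >= threshold]


frames = [
    [0.0, 1.0, 0.0, 1.0, 0.0],  # 2 peaks
    [0.0, 2.0, 0.0, 2.0, 0.0],  # 2 peaks
    [0.0, 1.0, 2.0, 3.0, 4.0],  # 0 peaks: abrupt change
]
print(peak_n_difference(frames))  # [0, 0, 2]
print(peak_boundaries(frames))    # [2]
```

On this input, the boundary is flagged at the same frame where a large frame-to-frame cepstral change would also produce an SVF peak. This is the overlap relied on in the obviousness reasoning.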

As per claim 5, Sharma et al. discloses receiving frames of acoustic data and
computing cepstral coefficients for each frame: Sharma et al. computes the Spectral
Variation Function (SVF), which is based on the Euclidean norm of the delta cepstral
coefficients of each individual frame (Col. 6, lines 32-38). This process inherently
presupposes fragmentation of the acoustic data into frames and computation of cepstral
coefficients for each frame.

Sharma et al. do not disclose a processing unit that determines the number of
peaks in cepstral coefficients for each frame and then segments the frames based on
the comparison of the number of peaks in each of the sequential frames.

However, Sharma et al.'s Spectral Variation Function (SVF) is computed based
on the time variation of the cepstral coefficients of each frame (Col. 6, lines 32-38). To
identify segments based on phoneme boundaries, Sharma tracks the peaks in the SVF,
because the SVF exhibits peaks at boundaries where the characteristics of speech change
rapidly (Col. 6, lines 63-67). The SVF tracks the frame-to-frame changes between
corresponding cepstral coefficients of successive frames (see Eq. 2). Thus,
changes in the number of peaks would also affect the SVF, so that the SVF would track the
frames identified by the applicant's method as having a high "Peak_N_Difference".

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify Sharma et al. to use the difference in the number 
of cepstral peaks instead of SVF, because these methods track the same cepstral 
properties in a similar fashion. Computing the difference in the number of peaks is a 
variation of SVF, and while not being as exact as SVF, it has the advantage of 
computational simplicity. 

5. Claims 11, 16, 18, 22, and 26-27 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Juang et al. in view of Sharma et al.

As per claims 11 and 16, Juang et al. do not disclose "determining cepstral
coefficients for the received frames of acoustic data, wherein the determining of the 
segmentation information is based on the determined cepstral coefficients." 

Sharma et al. disclose performing segmentation based on the values of Kmax
and Kmin, which depend on cepstral coefficients (elems. 20, 22, FIG. 1 and Col. 7, lines
23-28). Since Kmax (maximum number of segments for each frame) is computed
using the SVF (Col. 6, lines 29-31), which itself is a function of cepstral coefficients (Eq. 2),
the segmentation of each frame is ultimately performed based on the cepstral
coefficients.

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to modify Juang et al. as taught by Sharma et al. in order to 
augment the segmentation process of frame data, because the detection of phoneme 
boundaries using cepstral coefficients would improve the speed and accuracy of the 
resulting segmentation process. 

As per claim 18, Juang et al. disclose the method wherein readjusted scores are
fed from block 360 back into block 325 (FIG. 3). Thus the HMM and corresponding
scores are updated using the weighting factor Wi (elem. 360, FIG. 3), which reflects the
level of confidence in the current model. Therefore, the final recognition result (Elem.
355, FIG. 3) will incorporate the modification through weighting factor W.

As per claim 21, Juang et al. do not disclose "determining cepstral coefficients for 
the received frames of acoustic data, wherein the determining of the segmentation 
information is based on the determined cepstral coefficients." 

Sharma et al. disclose performing segmentation based on the values of Kmax
and Kmin, which depend on cepstral coefficients (elems. 20, 22, FIG. 1 and Col. 7, lines
23-28). Since Kmax (maximum number of segments for each frame) is computed
using the SVF (Col. 6, lines 29-31), which itself is a function of cepstral coefficients (Eq. 2),
the segmentation of each frame is ultimately performed based on the cepstral
coefficients.

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to modify Juang et al. as taught by Sharma et al. in order to 
augment the segmentation process of frame data, because the detection of phoneme 
boundaries using cepstral coefficients would improve the speed and accuracy of the 
resulting segmentation process. 

As per claims 26-27, Juang et al. do not disclose re-ordering the modified
recognition hypothesis scores and further using the re-ordered modified recognition
hypothesis scores for recognizing the patterns in the frames of acoustic data.

However, it would have been obvious to one of ordinary skill in the art at the time 
the invention was made that modification of HMM scores (training) (as applied to claim 
25) would necessarily involve re-ordering of scores of HMM outcomes. 
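A small N-best example can illustrate the reasoning that modifying HMM hypothesis scores necessarily involves re-ordering them. This is a hypothetical sketch. It assumes log-domain scores and a positive per-hypothesis weight applied as an additive log(W) term, and `rescore_and_reorder` is an illustrative name. Once the weights differ between hypotheses, the best-first order computed before the modification may no longer hold. The list therefore has to be sorted again before it is used for recognition.

```python
import math
from typing import List, Sequence, Tuple


def rescore_and_reorder(
    nbest: Sequence[Tuple[str, float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Add log(W) to each hypothesis's log score (W must be > 0),
    then sort the modified hypotheses best-first."""
    if len(nbest) != len(weights):
        raise ValueError("exactly one weight per hypothesis is required")
    modified = [(label, score + math.log(w)) for (label, score), w in zip(nbest, weights)]
    return sorted(modified, key=lambda item: item[1], reverse=True)


nbest = [("x", -10.0), ("y", -11.0)]  # best-first before rescoring
reordered = rescore_and_reorder(nbest, [0.1, 1.0])
print([label for label, _ in reordered])  # ['y', 'x']
```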

6. Claims 31 and 32 are rejected under 35 U.S.C. 103(a) as being obvious over Muroi
(4,918,731).

Muroi does not disclose a processing unit for generating frame numbers and a
trainer/HMM decoder.
Muroi discloses: 

• receiving frames of speech data (12, FIG. 1)
• the use of end frame numbers to compute the duration of the speech pattern
(phoneme segment) (Col. 7, lines 10-20). The duration of the speech pattern is
itself used in the calculation of weights for HMM state transitions (Col. 5, line
50). Therefore, frame numbers are used in the calculation of weights for HMM
state transitions.
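The chain relied on above runs from frame numbers to segment duration and then to a duration-dependent weight for HMM state transitions. It can be sketched as follows. The 10 ms frame shift and the Gaussian-shaped log-weight are assumptions made for illustration only. Muroi's actual weighting formula is not reproduced here, and `segment_duration_ms` and `duration_log_weight` are hypothetical names.

```python
def segment_duration_ms(start_frame: int, end_frame: int, frame_shift_ms: float = 10.0) -> float:
    """Duration of a segment spanning frames start_frame..end_frame inclusive,
    assuming a fixed (hypothetical) frame shift."""
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    return (end_frame - start_frame + 1) * frame_shift_ms


def duration_log_weight(duration_ms: float, mean_ms: float, std_ms: float) -> float:
    """Assumed Gaussian-shaped log-weight: 0 at the expected duration and
    increasingly negative as the observed duration departs from it."""
    z = (duration_ms - mean_ms) / std_ms
    return -0.5 * z * z


# Frame numbers 10 and 19 delimit a 10-frame (100 ms) segment.
d = segment_duration_ms(10, 19)
print(d)                                        # 100.0
print(duration_log_weight(140.0, 100.0, 20.0))  # -2.0
```

In this sketch, the only inputs to the weight are the start and end frame numbers. This mirrors the statement that frame numbers are used in calculating the transition weights.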

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made that the system of Muroi necessarily involved a processing unit that 
generated frame numbers and a trainer/HMM decoder for using the frame numbers to 
generate weights, since Muroi's system comprises hardware and software, and thus 
would require modules which would produce the corresponding frame numbers and
also recognize patterns using weighted HMM transition probabilities. 

Conclusion 

7. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Fanty et al. (6,535,851) teach speech segmentation based on cepstral coefficients.

Ruey-Ching et al., "Improvement in Connected Mandarin Digit Recognition by Explicitly
Modeling Coarticulatory Information," teach that the number of peaks in cepstral
coefficients remains the same across the frames corresponding to the same segment.



Naylor et al. (5,806,034) teach a system for backtracking in HMMs that relies on
frame numbers (FIG. 7).

Junqua (5,806,030) teaches clustering methods for HMM speech recognizers. 

8. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Dmitry Brant, whose telephone number is (703) 305-8954.
The examiner can normally be reached on Mon. - Fri. (8:30am - 5pm).

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Talivaldis Ivars Smits, can be reached on (703) 306-3011. The fax phone
number for the organization where this application or proceeding is assigned is (703)
872-9306.

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the Tech Center 2600 receptionist, whose telephone
number is (703) 305-4700.






DB 




4/7/04 



DORIS H. TO 
SUPERVISORY PATENT EXAMINER 
TECHNOLOGY CENTER 2600 